In real teaching scenarios, an excellent teacher always teaches what they are good at but the student is not, which gives the student the best assistance in making up for weaknesses and becoming well-rounded overall. Enlightened by this, we introduce the approach into the knowledge distillation framework and propose a data-based distillation method named ``Teaching what you Should Teach (TST)''. Specifically, TST contains a neural network-based data augmentation module with a prior bias, which can assist in finding what the teacher is good at while the student is not, by learning magnitudes and probabilities to generate suitable samples. By alternately training the data augmentation module and the generalized distillation paradigm, a student model with excellent generalization ability can be obtained. To verify the effectiveness of TST, we conducted extensive comparative experiments on object recognition (CIFAR-100 and ImageNet-1k), detection (MS-COCO), and segmentation (Cityscapes) tasks. As experimentally demonstrated, TST achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we present intriguing studies of TST, including how to solve the performance degradation caused by a stronger teacher and what magnitudes and probabilities the distillation framework needs.
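The alternating scheme described above can be illustrated with a short sketch. This is a minimal reading of the abstract, assuming a standard temperature-scaled KL distillation loss; the module and function names (`aug_module`, `kd_loss`, the two optimizers) are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard temperature-scaled KL distillation loss.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def tst_step(x, teacher, student, aug_module, opt_aug, opt_student):
    # Phase 1: update only the augmentation module so that the samples it
    # generates maximize the teacher-student discrepancy, i.e. they probe
    # "what the teacher is good at but the student is not".
    x_aug = aug_module(x)  # applies learned magnitudes/probabilities to x
    gap = kd_loss(student(x_aug), teacher(x_aug))
    opt_aug.zero_grad()
    (-gap).backward()  # gradient ascent on the discrepancy
    opt_aug.step()

    # Phase 2: update only the student, distilling on the samples
    # the augmentation module now produces.
    with torch.no_grad():
        x_aug = aug_module(x)
        t_logits = teacher(x_aug)
    loss = kd_loss(student(x_aug), t_logits)
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()
```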
This paper presents an approach that reconstructs a hand-held object from a monocular video. In contrast to many recent methods that directly predict object geometry by a trained network, the proposed approach does not require any learned prior about the object and is able to recover more accurate and detailed object geometry. The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker. Then, the object geometry can be recovered by solving a multi-view reconstruction problem. We devise an implicit neural representation-based method to solve the reconstruction problem and address the issues of imprecise hand pose estimation, relative hand-object motion, and insufficient geometry optimization for small objects. We also provide a newly collected dataset with 3D ground truth to validate the proposed approach.
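A minimal sketch of the implicit-representation ingredient: an MLP signed-distance field queried at points brought into a common object frame by the per-frame poses from the hand tracker, plus the eikonal regularizer commonly used to keep the field a valid SDF. Names and architecture are illustrative; the paper's exact network and losses are not specified in the abstract.

```python
import torch
import torch.nn as nn

class ObjectSDF(nn.Module):
    # Small MLP mapping a 3D point (object frame) to a signed distance.
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts):
        return self.net(pts)

def to_object_frame(pts_cam, R, t):
    # Per-frame pose (R, t) from the hand tracker maps camera-frame samples
    # into a shared object frame, turning the video into multi-view data.
    # Inverse of x_cam = R x_obj + t, written for row-vector points.
    return (pts_cam - t) @ R

def eikonal_loss(sdf, pts):
    # Encourages |grad f| = 1 so the field stays a signed distance function.
    pts = pts.clone().requires_grad_(True)
    d = sdf(pts)
    (grad,) = torch.autograd.grad(d.sum(), pts, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```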
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
Missing scans are inevitable in longitudinal studies due to participant dropout or failed scans. In this paper, we propose a deep learning framework to predict missing scans from acquired scans, catering to longitudinal infant studies. Prediction of infant brain MRI is challenging owing to rapid contrast and structural changes, particularly during the first year of life. We introduce a trustworthy metamorphic generative adversarial network (MGAN) for translating infant brain MRI from one time point to another. MGAN has three key features: (i) image translation leveraging both spatial and frequency information for detail-preserving mapping; (ii) a quality-guided learning strategy that focuses attention on challenging regions; and (iii) a multi-scale hybrid loss function that improves the translation of tissue contrast and structural details. Experimental results show that MGAN outperforms existing GANs by accurately predicting both contrast and anatomical details.
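The three listed ingredients suggest a loss mixing spatial and frequency terms at several scales. Below is a hedged sketch of such a multi-scale hybrid loss, not the paper's exact formulation (the abstract does not give one); the scales and weighting are illustrative.

```python
import torch
import torch.nn.functional as F

def multiscale_hybrid_loss(pred, target, scales=(1, 2, 4), w_freq=0.1):
    # Spatial term: L1 at several resolutions, covering coarse structure
    # (downsampled) as well as fine tissue detail (full resolution).
    spatial = 0.0
    for s in scales:
        p = F.avg_pool2d(pred, s) if s > 1 else pred
        t = F.avg_pool2d(target, s) if s > 1 else target
        spatial = spatial + F.l1_loss(p, t)
    # Frequency term: matching FFT magnitudes ties global contrast
    # (low frequencies) and sharp edges (high frequencies) together.
    freq = F.l1_loss(torch.fft.rfft2(pred).abs(), torch.fft.rfft2(target).abs())
    return spatial + w_freq * freq
```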
Image aesthetic quality assessment has become popular over the past decade. In addition to numerical assessment, natural-language assessment (aesthetic captioning) has been proposed to describe the general aesthetic impression of an image. In this paper, we propose aesthetic attribute assessment, i.e., aesthetic attribute captioning, which assesses aesthetic attributes such as composition, lighting usage, and color arrangement. Labeling comments on aesthetic attributes is a non-trivial task, which limits the scale of the corresponding datasets. We construct a novel dataset, named DPC-CaptionsV2, in a semi-automatic manner: knowledge is transferred from a small-scale dataset with full annotations to large-scale professional comments from a photography website. Images in DPC-CaptionsV2 contain annotations of up to four aesthetic attributes: composition, lighting, color, and subject. We then propose a new version of the Aesthetic Multi-Attribute Network (AMANv2), based on the BUTD and VLPSA models. AMANv2 fuses features of a mixture of the small-scale PCCD dataset with full annotations and the large-scale DPC-CaptionsV2 dataset with full annotations. Experimental results on DPC-CaptionsV2 show that our method can predict comments on the four aesthetic attributes that are closer to the aesthetic topics than those generated by the previous AMAN model. Under image-captioning evaluation criteria, the specially designed AMANv2 model performs better than the CNN-LSTM model and the AMAN model.
Total knee arthroplasty (TKA) is a common orthopedic surgery that replaces a damaged knee joint with artificial implants. Inaccuracy in achieving the planned implant position can result in the risk of aseptic loosening and wear of implant components and even joint revision, and such inaccuracies mostly occur on the tibial side in conventional jig-based TKA (CON-TKA). This study aims to precisely evaluate the accuracy of the proximal tibial resection plane intraoperatively and in real time, with minimal changes to the workflow of the CON-TKA procedure. Two fluoroscopic radiographs captured during the proximal tibial resection phase, together with a preoperative patient-specific tibia 3D mesh model segmented from computed tomography (CT) scans and a trocar pin 3D mesh model, are used in the proposed simultaneous localization and mapping (SLAM) system to estimate the proximal tibial resection plane. Validations using both simulation and in-vivo datasets are performed to demonstrate the robustness and potential clinical value of the proposed algorithm.
Purpose: Recovering QSM in the presence of phase errors has been challenging; these errors can be caused by noise or by local susceptibility variations in cases of brain hemorrhage and calcification. We propose a Bayesian formulation for QSM, where a two-component Gaussian mixture distribution is used to model the long-tailed noise (error) distribution, and design an approximate message passing (AMP) algorithm with automatic and adaptive parameter estimation. Theory: The wavelet coefficients of the susceptibility map follow a Laplace distribution. The measurement noise follows a two-component Gaussian mixture distribution, where the second Gaussian component models the noise outliers. The distribution parameters are treated as unknown variables and jointly recovered with the susceptibility using AMP. Methods: The proposed AMP with parameter estimation (AMP-PE) is compared with the state-of-the-art nonlinear L1-QSM and MEDI approaches, which adopt the L1-norm and L2-norm data-fidelity terms respectively. The three approaches are tested on the Sim2SNR1 data from QSM challenge 2.0, and on in-vivo data from both healthy and hemorrhage scans. Results: On the simulated Sim2SNR1 dataset, AMP-PE achieves the best NRMSE and SSIM, MEDI achieves the lowest HFEN, and each approach has its own strong suit with respect to the various local evaluation metrics. On the in-vivo datasets, AMP-PE is better at preserving structural details and removing streaking artifacts than L1-QSM and MEDI. Conclusion: By leveraging a customized Gaussian mixture noise model, AMP-PE achieves better performance on challenging QSM cases involving hemorrhage and calcification. It is equipped with built-in parameter estimation, which avoids the subjective bias of the usual visual fine-tuning step in in-vivo reconstruction.
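Written out with illustrative symbols (the abstract does not fix notation), the two priors stated above are

\[
p(w_i) = \frac{\lambda}{2}\, e^{-\lambda |w_i|},
\qquad
p(n_j) = (1-\pi)\,\mathcal{N}(n_j;\, 0, \sigma_1^2) + \pi\,\mathcal{N}(n_j;\, 0, \sigma_2^2),
\]

where $w_i$ are the wavelet coefficients of the susceptibility map, $n_j$ is the measurement noise, and the broader second component ($\sigma_2^2 \gg \sigma_1^2$) absorbs the outliers; AMP-PE treats $(\lambda, \pi, \sigma_1^2, \sigma_2^2)$ as unknowns and estimates them jointly with the susceptibility.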
Since most admitted patients survive, medical events of interest, such as mortality, typically occur at a low rate. Training models with such an imbalanced rate (class-density discrepancy) may lead to suboptimal prediction. Traditionally, this problem is addressed by ad-hoc methods such as resampling or reweighting, but performance in many cases is still limited. We propose a framework for training models under this imbalance: 1) we first decouple the feature extraction and classification processes, adjusting the training batches separately for each component to mitigate the bias caused by the class-density discrepancy; 2) we train with both a density-aware loss and a learnable cost matrix for misclassification. We demonstrate the improved performance of our model on real-world medical datasets (TOPCAT and MIMIC-III), showing improvements in AUC-ROC, AUC-PRC, and Brier Skill Score compared with baselines in the domain.
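As an illustration of item 2), such a density-aware, cost-sensitive loss could be sketched as below. This is one plausible reading of the abstract, assuming a softmax classifier; the exact form of the paper's loss and cost matrix is not given there, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class DensityAwareCostLoss(nn.Module):
    """Cross-entropy weighted by inverse class density, plus an expected
    misclassification cost under a learnable cost matrix."""

    def __init__(self, class_counts):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        w = counts.sum() / counts  # density-aware: rare classes weigh more
        self.register_buffer("class_w", w / w.mean())
        n = len(class_counts)
        # Learnable off-diagonal misclassification costs (diagonal starts at 0).
        self.cost = nn.Parameter(torch.ones(n, n) - torch.eye(n))

    def forward(self, logits, target):
        log_p = torch.log_softmax(logits, dim=1)
        nll = -log_p.gather(1, target.unsqueeze(1)).squeeze(1)
        # Expected cost of the predicted distribution; the row of the
        # cost matrix is selected by the true label.
        exp_cost = (log_p.exp() * self.cost[target].clamp(min=0)).sum(dim=1)
        return (self.class_w[target] * (nll + exp_cost)).mean()
```

For a binary mortality task with a 5% event rate, for example, the criterion would be constructed as `DensityAwareCostLoss([9500, 500])`.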
Unknown view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming uniformly distributed orientations. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available, or fail to be sufficiently robust. To rid OMR of these limitations, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem with a denoised reference projection and a nonnegativity constraint. This is enabled by a new closed-form expression for the spatial autocorrelation function. Further, we design an easy-to-compute initial density map that effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.
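One way to write the consensus idea (with illustrative notation; the abstract does not give the exact objective) is

\[
\min_{\rho \geq 0,\; \{O^{(\ell)}\}\ \text{orthogonal}} \;
\sum_{\ell} \big\| a_\ell(\rho) - O^{(\ell)} b_\ell \big\|_2^2
\;+\; \gamma \,\big\| P(\rho) - \hat{p} \big\|_2^2,
\]

where the $b_\ell$ are the factors recovered from the rotation-invariant moments, known only up to orthogonal matrices $O^{(\ell)}$, the $a_\ell(\rho)$ are the corresponding coefficients computed from the candidate density $\rho$, and the second term ties $\rho$ to the denoised reference projection $\hat{p}$; the nonnegativity constraint and the consistency terms are what couple the two sets of unknowns.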
Color fundus photography and Optical Coherence Tomography (OCT) are the two most cost-effective tools for glaucoma screening. Both modalities of images have prominent biomarkers that indicate suspected glaucoma. Clinically, it is often recommended to take both screenings for a more accurate and reliable diagnosis. However, although numerous algorithms have been proposed for computer-aided diagnosis based on fundus images or OCT volumes, there are still few methods leveraging both modalities for glaucoma assessment. Inspired by the success of the Retinal Fundus Glaucoma Challenge (REFUGE) we held previously, we set up the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge to encourage the development of fundus \& OCT-based glaucoma grading. The primary task of the challenge is to grade glaucoma from both 2D fundus images and 3D OCT scanning volumes. As part of GAMMA, we have publicly released a glaucoma-annotated dataset with both 2D color fundus photography and 3D OCT volumes, which is the first multi-modality dataset for glaucoma grading. In addition, an evaluation framework was established to assess the performance of the submitted methods. During the challenge, 1272 results were submitted, and finally the top 10 teams were selected for the final stage. We analyze their results and summarize their methods in this paper. Since all of these teams submitted their source code in the challenge, a detailed ablation study is also conducted to verify the effectiveness of the particular modules they propose. We find that many of the proposed techniques are practical for the clinical diagnosis of glaucoma. As the first in-depth study of fundus \& OCT multi-modality glaucoma grading, we believe the GAMMA Challenge will be an essential starting point for future research.